Improve SQLite write speed; fix indefinite hang while analyzing large bundles (>1 GB) #66
Conversation
- Set DB connection journal mode to use Write-Ahead Log to speed up insert transactions.
- Fix buffer offset bugs in CRC checksum calculation; use a fast CRC fingerprint for external file references.
Pull request overview
Addresses a hang/performance cliff in UnityDataTools analyze on very large (>1 GB) Asset Bundles by reducing expensive CRC re-processing and improving SQLite write throughput.
Changes:
- Fix `UnityFileReader.ComputeCRC` chunking/offset logic to avoid re-processing and incorrect sizing.
- Speed up CRC handling of `StreamingInfo`/`StreamedResource` by using a lightweight external-stream fingerprint instead of reading large `.resS` blobs.
- Attempt to switch SQLite to WAL journal mode to speed up insert-heavy workloads.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| UnityFileSystem/UnityFileReader.cs | Corrects CRC chunk iteration to advance offsets and handle final partial chunks properly. |
| Analyzer/PPtrAndCrcProcessor.cs | Replaces external resource reads with an offset/size/path fingerprint and fixes CRC sizing for managed-reference strings. |
| Analyzer/SQLite/Writers/SerializedFileSQLiteWriter.cs | Updates PPtrAndCrcProcessor construction to match new API; leaves an unused parameter. |
| Analyzer/SQLite/Writers/SQLiteWriter.cs | Adds a WAL pragma on connection open (but currently conflicts with Init.sql pragmas). |
```csharp
using var walCommand = m_Database.CreateCommand();
walCommand.CommandText = "PRAGMA journal_mode=WAL";
walCommand.ExecuteNonQuery();
```
PRAGMA journal_mode=WAL set here is later overridden by Resources.Init (Init.sql currently ends with PRAGMA journal_mode = MEMORY;), so the connection will not actually run in WAL mode. To make WAL effective, either move the WAL pragma to after executing Resources.Init, or remove/adjust the journal_mode pragma in Init.sql to not override the desired mode.
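One way to address this (a sketch only, assuming `m_Database` exposes the standard ADO.NET `CreateCommand` API and that `Resources.Init` is the string holding the Init.sql script) is to execute Init.sql first and set the journal mode afterwards, so nothing can override it:

```csharp
// Run the schema/pragma script first...
using (var initCommand = m_Database.CreateCommand())
{
    initCommand.CommandText = Resources.Init; // assumed accessor for Init.sql
    initCommand.ExecuteNonQuery();
}

// ...then switch to WAL last, so Init.sql's trailing
// `PRAGMA journal_mode = MEMORY;` cannot override it.
using (var walCommand = m_Database.CreateCommand())
{
    walCommand.CommandText = "PRAGMA journal_mode=WAL";
    walCommand.ExecuteNonQuery();
}
```

Alternatively, removing the `journal_mode` pragma from Init.sql achieves the same effect with less ordering sensitivity.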
It seems this is a good optimization to add for UnityDataTools, but according to the Copilot analysis it is not actually applied.
```diff
 public void WriteSerializedFile(string relativePath, string fullPath, string containingFolder)
 {
     using var sf = UnityFileSystem.OpenSerializedFile(fullPath);
     using var reader = new UnityFileReader(fullPath, 64 * 1024 * 1024);
-    using var pptrReader = new PPtrAndCrcProcessor(sf, reader, containingFolder, AddReference);
+    using var pptrReader = new PPtrAndCrcProcessor(sf, reader, AddReference);
```
containingFolder is now unused after removing it from the PPtrAndCrcProcessor constructor call, which will trigger an unused-parameter warning and can mislead callers. Consider removing this parameter from WriteSerializedFile(...) (and updating its call sites) if it’s no longer needed.
```diff
-private UnityFileReader GetResourceReader(string filename)
+private static uint AppendExternalStreamFingerprint(uint crc32, long offset, int size, string filename)
 {
     var slashPos = filename.LastIndexOf('/');
     if (slashPos > 0)
     {
         filename = filename.Remove(0, slashPos + 1);
     }

-    if (!m_resourceReaders.TryGetValue(filename, out var reader))
-    {
-        try
-        {
-            reader = new UnityFileReader("archive:/" + filename, 4 * 1024 * 1024);
-        }
-        catch (Exception)
-        {
-            try
-            {
-                reader = new UnityFileReader(Path.Join(m_Folder, filename), 4 * 1024 * 1024);
-            }
-            catch (Exception)
-            {
-                Console.Error.WriteLine();
-                Console.Error.WriteLine($"Error opening resource file (unknown)");
-                reader = null;
-            }
-        }
-
-        m_resourceReaders[filename] = reader;
-    }
-
-    return reader;
+    crc32 = Crc32Algorithm.Append(crc32, BitConverter.GetBytes(offset));
+    crc32 = Crc32Algorithm.Append(crc32, BitConverter.GetBytes(size));
+    crc32 = Crc32Algorithm.Append(crc32, Encoding.UTF8.GetBytes(filename));
```
AppendExternalStreamFingerprint uses BitConverter.GetBytes(...), which is endianness-dependent. That makes the computed CRC vary on big-endian platforms even for identical data. Prefer writing offset/size in a fixed byte order (e.g., little-endian via BinaryPrimitives) before appending to the CRC to keep results stable across architectures.
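A possible fix, sketched here under the assumption that the project's `Crc32Algorithm` (Force.Crc32-style `Append` overloads) is in scope, is to serialize the integers in an explicit little-endian layout via `BinaryPrimitives`:

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

private static uint AppendExternalStreamFingerprint(uint crc32, long offset, int size, string filename)
{
    // Fixed little-endian layout: 8 bytes of offset followed by 4 bytes of size,
    // so the CRC input bytes are identical on little- and big-endian hosts.
    var buffer = new byte[12];
    BinaryPrimitives.WriteInt64LittleEndian(buffer.AsSpan(0, 8), offset);
    BinaryPrimitives.WriteInt32LittleEndian(buffer.AsSpan(8, 4), size);
    crc32 = Crc32Algorithm.Append(crc32, buffer);
    return Crc32Algorithm.Append(crc32, Encoding.UTF8.GetBytes(filename));
}
```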
```diff
 /// <summary>
 /// Computes CRC32 over a contiguous byte range, reading the file in buffer-sized chunks.
 /// </summary>
 public uint ComputeCRC(long fileOffset, int size, uint crc32 = 0)
 {
-    var readSize = size > m_Buffer.Length ? m_Buffer.Length : size;
-    var readBytes = 0;
-
-    while (readBytes < size)
+    var remaining = size;
+    while (remaining > 0)
     {
-        var offset = GetBufferOffset(fileOffset, readSize);
-        crc32 = Crc32Algorithm.Append(crc32, m_Buffer, offset, readSize);
-        readBytes += readSize;
+        var chunk = (int)Math.Min((long)m_Buffer.Length, remaining);
+        var offset = GetBufferOffset(fileOffset, chunk);
+        crc32 = Crc32Algorithm.Append(crc32, m_Buffer, offset, chunk);
+        fileOffset += chunk;
+        remaining -= chunk;
     }
```
ComputeCRC chunking logic was changed to fix offset/size handling; this is easy to regress without coverage. Add a unit test that verifies CRC results for ranges that cross the internal buffer boundary (e.g., size > buffer, and a final partial chunk) to lock in the corrected behavior.
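Such a test could look roughly like this (an NUnit-style sketch; it assumes `UnityFileReader` can open a plain file from disk and that a Force.Crc32-style `Crc32Algorithm.Compute` helper is available as the reference implementation):

```csharp
[Test]
public void ComputeCRC_RangeLargerThanBuffer_MatchesSingleShotCrc()
{
    // 10 000 bytes of deterministic pseudo-random data.
    var data = new byte[10_000];
    new Random(1234).NextBytes(data);
    var path = Path.GetTempFileName();
    File.WriteAllBytes(path, data);

    try
    {
        // 4 KB buffer < 10 000 bytes: forces two full chunks plus a final partial chunk.
        using var reader = new UnityFileReader(path, 4 * 1024);
        var chunked = reader.ComputeCRC(0, data.Length);

        Assert.AreEqual(Crc32Algorithm.Compute(data), chunked);
    }
    finally
    {
        File.Delete(path);
    }
}
```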
SkowronskiAndrew left a comment
Thanks for submitting this proposed fix.
I think the nature of the fix needs to change a bit so that we don't lose the value of the CRC checks into those external streams.
But this is a good start - it sounds like we definitely need to do something to fix `analyze` for the data set you are looking at. There is no reason we shouldn't be able to calculate the CRC across large data in a decent amount of time if we are efficient with RAM usage and access patterns. I wasn't familiar at all with this area of UnityDataTools, but I hope my suggestions point in a useful direction.
```diff
-private UnityFileReader GetResourceReader(string filename)
+/// <summary>
+/// Extends CRC32 with a stable fingerprint for an external stream segment without reading blob bytes.
```
The CRC is stored in the database to represent the content of the object, including all external resource references like mesh and texture content. The CRC is the fingerprint for "has this asset changed?"

- If a texture blob in a .resS file changes but stays the same size, the old code detects it (different bytes = different CRC).
- The new code gives an identical "fingerprint" (same offset + size + filename).

It sounds like there are performance problems in the existing calculation, but I don't think this is the solution. There are probably two things to do:

- Continue to calculate the CRC of `.resS` and `.resource` blobs, but make sure we do this in an efficient fashion. I'll take a bit more of a look at the existing code.
- Offer a way to skip the CRC calculation if you don't need it. Currently you can turn it off with `--skip-references`, but that is not intuitive, so there should be a separate `--skip-crc` and some careful work to make sure that the permutations of those flags work properly.
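The first option could be sketched as a streaming CRC that reuses one scratch buffer, so memory stays flat regardless of blob size (this is only an illustration of the shape; `ReadArray` is a hypothetical raw-read helper, not necessarily the existing `UnityFileReader` API):

```csharp
private static uint AppendExternalStreamCrc(uint crc32, UnityFileReader reader,
                                            long offset, long size, byte[] scratch)
{
    // Hash the real .resS/.resource bytes chunk by chunk instead of loading
    // the whole blob, preserving the change-detection guarantee of the CRC.
    while (size > 0)
    {
        var chunk = (int)Math.Min(scratch.Length, size);
        reader.ReadArray(offset, chunk, scratch); // hypothetical helper
        crc32 = Crc32Algorithm.Append(crc32, scratch, 0, chunk);
        offset += chunk;
        size -= chunk;
    }
    return crc32;
}
```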
```csharp
private CallbackDelegate m_Callback;

private Dictionary<string, UnityFileReader> m_resourceReaders = new();
```
I'm wondering if this existing implementation was very expensive in terms of allocating a lot of memory during processing and never freeing it. Is it possible you were running out of memory?
Apparently each UnityFileReader allocates 4 MB, and I guess it might grow based on the actual data size?
I'm guessing that the way to fix things will be to improve the management of these UnityFileReaders. I think these should be flushed between SerializedFiles - currently .resS and .resource files are NEVER shared; it's always pointing at another companion file with the same filename root.
So the only purpose of caching would be while processing different objects of the same serialized file (because they might reference different segments of the same file).
I recommend that you take a look at the access pattern, e.g. using some logging to double-check exactly what requests are coming in for the external streams in the case that was so slow.
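The flush-between-SerializedFiles idea could be as simple as disposing the cached readers when the processor finishes one serialized file (a sketch of the suggested lifetime, not existing code):

```csharp
// Called when processing of one serialized file ends. Since .resS/.resource
// companions are never shared across serialized files, nothing useful is lost.
private void FlushResourceReaders()
{
    foreach (var reader in m_resourceReaders.Values)
    {
        reader?.Dispose();
    }
    m_resourceReaders.Clear();
}
```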
Using the `UnityDataTools analyze` command on a target folder containing large (>1 GB) Asset Bundles can hang indefinitely. This is due to two issues, which this PR addresses with the changes described above.
With these changes, processing large asset bundles goes from an indefinite (>24 hours) process to a 15 minute operation.